


United States Patent [19] 

Jeffries et al. 



US006035333A
[11] Patent Number: 6,035,333
[45] Date of Patent: Mar. 7, 2000



[54] METHOD AND SYSTEM FOR PROVIDING 
CONGESTION CONTROL IN A DATA 
COMMUNICATIONS NETWORK 

[75] Inventors: Clark Debs Jeffries, Clemson, S.C.; 

Anoop Ghanwani, Durham, N.C.; 
Gerald Arnold Marin, Chapel Hill, 
N.C.; Ken Van Vu, Cary, N.C. 

[73] Assignee: International Business Machines 
Corporation, Armonk, N.Y. 

[21] Appl. No.: 08/977,252 
[22] Filed: Nov. 24, 1997 

[51] Int. Cl.7 G06F 15/173

[52] U.S. Cl. 709/224; 709/223; 709/235; 370/445

[58] Field of Search 709/253, 200, 250, 235, 223, 224;
370/448, 445, 446

[56] References Cited 

U.S. PATENT DOCUMENTS 



5,650,997 7/1997 Yang et al 370/448 

5,852,723 12/1998 Kalkunte et al 709/235 

5,905,870 5/1999 Mangin et al 709/234 

5,936,962 8/1999 Haddock et al 370/446 

5,940,399 8/1999 Weizman 370/445 

FOREIGN PATENT DOCUMENTS 

0632620 A2 6/1994 European Pat. Off H04L 12/40 



98/07257A1 8/1996 WIPO H04L 12/413 

OTHER PUBLICATIONS 

Ren, et al. Flow Control and Congestion Avoidance in 
Switched Ethernet LANs, IEEE, pp. 508-512, Jun. 1997. 

Primary Examiner — Zarni Maung 
Assistant Examiner — William Titcomb 
Attorney, Agent, or Firm— Gerald R. Woods 

[57] ABSTRACT 

A bin packing algorithm is employed to schedule computer 
network activities, such as pause times required for opera- 
tion of an Ethernet network which implements existing 
IEEE 802.3x standards. In such a network, any node in the
network can control the flow of traffic from upstream 
stations in order to avoid congestion at the flow-controlling 
node. Upon sensing congestion, the flow-controlling node 
determines how long each upstream node contributing to the 
congestion should pause transmission over the next control 
interval. In accordance with the invention, the pause times 
are scheduled or staggered by using the bin packing algo- 
rithm to sort the sources into one or more bins. One required 
bin property is that there is no overlap in pause times for the 
sources within a particular bin. Another required bin prop- 
erty is that the sum of the pause times within a bin can be 
no greater than the length of the control interval. In a 
preferred embodiment, the calculated pause times are sepa- 
rated into different groups having different number ranges 
and no more than one pause time is selected from any 
particular group for packing into a given bin. 

5 Claims, 6 Drawing Sheets 
METHOD AND SYSTEM FOR PROVIDING
CONGESTION CONTROL IN A DATA 
COMMUNICATIONS NETWORK 

FIELD OF THE INVENTION 

The present invention relates to data communications 
networks and more particularly to a method and system for 
providing congestion control in such a network. 

BACKGROUND OF THE INVENTION 

When data processing systems first were used commer- 
cially on a widespread basis, the standard system configu- 
ration was an autonomous mainframe or host computer 
which could be accessed only through locally-attached ter- 
minals. Few people, at that time, perceived any significant 
benefit from interconnecting host computers. 

Over time, it came to be understood that significant 
commercial advantages could be gained by interconnecting 
or networking host computers. Data originating with users at 
one host computer could be readily and rapidly shared with
users located at other host computers, even where those 
other host computers were many miles away. Also, the 
functional capabilities of a given host computer could be 
treated as a resource that could be shared not only among 
locally-attached users but also among remote, network- 
attached users. Mainframe networks of this type came to be 
generically referred to as Wide Area Networks, commonly 
abbreviated to WANs. 

Certain parallels exist between the development of main- 
frame computer technology and the later development of 
personal computer technology. Early personal computers 
were relatively unsophisticated devices intended for use by 
a single user in a standalone configuration. Eventually, the 
same kinds of needs (data sharing and resource sharing) that 
drove the development of mainframe networks began to 
drive the development of networks of personal computers 
and auxiliary devices, such as printers and data storage 
devices. While mainframe networks developed primarily 
using point-to-point connections among widely-separated 
mainframes, personal computer networks developed using 
shared or common transmission media to interconnect per- 
sonal computers and auxiliary devices within a 
geographically-limited area, such as a building or even an 
area within a building. Networks of this type came to be 
generically referred to as Local Area Networks or LANs. 

Different LAN technologies exist. Currently, the most 
popular LAN technology is Ethernet technology. In an 
Ethernet LAN, personal computers and auxiliary devices 
share a common bi-directional data bus. In the following 
description, LAN-attached devices will be generically 
referred to as stations or LAN stations. Any transmission- 
capable LAN station may initiate transmission on the bus 
and every transmission propagates in both directions and is 
received by every LAN station attached to the same bus, 
including the transmitting station. 

Because several LAN stations can attempt to claim the
bus at the same time, a Carrier Sense Multiple Access/
Collision Detect (CSMA/CD) protocol exists to resolve
conflicts among contending users. The protocol is relatively
simple. When a station has data to transmit, it "listens" to the 
bus to see if the bus is already carrying data from another 
station. If the bus is found not to be in use, the listening 
station begins its own transmission immediately. If the bus 
is found to be in use, the station with data to send waits for 
a predetermined interval before restarting the bus acquisition 
process. 
Since electrical signals require time to propagate down
any conductor, two or more stations can listen, find the bus
quiet at the time, and begin transmitting simultaneously. If
that happens, data from the transmitting stations collide and
become corrupted. If a transmitting station doesn't detect
the same data it transmitted, that station sends a short
jamming signal and stops transmitting. The jamming signal
increases the chances that all other transmitting stations will
detect the collision and stop transmitting themselves.
Following a random delay, each transmitting station restarts the
bus acquisition process.

The same user needs (data sharing and resource sharing)
which drove the development of Ethernet LANs have driven
the creation of Ethernet networks consisting of multiple
Ethernet LANs interconnected through boundary devices
known as LAN bridges or switches. Point-to-point
connections or links between LAN switches permit traffic
originating in any given Ethernet LAN to be transported to a LAN
station connected to any other LAN in the same switched
Ethernet network. A given switch-to-switch link typically
carries traffic from multiple sources concurrently. Although
Ethernet was originally developed as a shared-media LAN
technology, "switched" Ethernet technology is being
developed to support full duplex links.

The CSMA/CD protocol, while providing a fairly effective
flow control mechanism within a single shared Ethernet
LAN, is ineffective in controlling flow (preventing
congestion) on switched links. To provide flow control on
switch-to-switch links, at least one standards group, the
IEEE 802.3 working group, has developed a flow control
standard (IEEE 802.3x) for such links. Under the standard
(as currently developed), a station that wants to inhibit
transmission of data from one or more upstream stations on
the network generates a pause frame which contains, among
other things, a pause time and a reserved multicast address.
Pause times are expressed as a number of time slots, with a
time slot being the time required to transmit a sixty-four byte
packet on the link. While IEEE 802.3x flow control was
designed primarily to overcome the lack of flow control on
full duplex Ethernet links, it can also be used on shared
segments.

A link-controlling station, such as a bridge, can respond to
the reserved multicast address to pause traffic on the entire
link for the specified pause time. In theory only, it is possible
to send a pause frame with a destination address which
identifies a specific LAN station instead of a link-controlling
station. In current practice, it is a violation of the standard to
generate a pause frame having anything other than a
reserved multicast address as the destination address.

A general problem with the approach currently defined in
the standard is that flow control is performed on a link level
rather than on a per-station level. As a consequence, a flow
control initiated to deal with a congested path may end up
interfering with flow along uncongested paths as well.

Extending the standard to permit flow control signals to
be addressed to individual stations solves some, but not all,
problems. The length of time between successive pause
frames can be defined as a control interval. If a
flow-controlling station generates pause frames requiring that n
different traffic sources pause at the beginning of a control
interval, then all of the traffic sources will pause at
substantially the same time, reducing the number of active
connections on a controlled link by n. However, as the traffic
sources complete their individual pause times, multiple
sources may resume sending at substantially the same time.
As a consequence of this unintended "synchronization" of 
traffic actions at individual sources, the traffic rate at the
flow-controlling station may oscillate between too little
traffic (at the onset of source pause times) and too much
traffic (as pause times for multiple stations are completed at
or about the same time).

SUMMARY OF THE INVENTION

The present invention solves the above-discussed problem
by distributing pause times over the control interval.
The invention can be implemented as a method performed at
any station responsible for controlling the flow of traffic on
a link between switching devices in a data communication
network. Traffic at the station is monitored to detect the
onset of congestion. When congestion is detected, traffic
sources contributing to the congestion are identified in a
defined set. A pause time is established for each of the
sources included in the set. Then the start of the established
pause time for each source in the set is scheduled using an
algorithm which attempts to spread the pause times over the
duration of the control interval. The number of stations
paused at any given time is minimized while continuing to
satisfy the requirement that each station be silent for its
assigned pause time during the control interval.

BRIEF DESCRIPTION OF THE DRAWINGS

While the specification concludes with claims particularly
pointing out and distinctly claiming that which is regarded
as the present invention, details of a preferred embodiment
of the invention may be more readily ascertained from the
following detailed description when read in conjunction
with the accompanying drawings wherein:

FIG. 1 is an illustration of a mesh-connected network in
which the present invention is implemented;

FIG. 2 is a more detailed representation of a typical
switching point in the illustrated network;

FIG. 3 illustrates the general environment for the present
invention;

FIG. 4 illustrates the major functional components of a
node in which the present invention is implemented;

FIG. 5 shows the structure of a pause control frame;

FIG. 6, consisting of FIGS. 6A and 6B taken together, is
a flow chart of the inventive process for scheduling pause
times to provide effective utilization of available bandwidth;

FIG. 7 is a representation of the packing of pause times as
a result of several iterations of the process described with
reference to FIG. 6; and

FIG. 8 is a chart of scheduled transmission and pause
times for each of the traffic sources in the assumed set.

DETAILED DESCRIPTION

FIG. 1 is a generic representation of a data communication
network which supports communications between
remote users, represented by stations 10 and 12. In such a
network, data originating at one of the stations reaches the
other after traversing an intervening network generically
represented as a number of mesh-connected data systems or
nodes, typified by node 14. The configuration and functionality
of the nodes will vary as a function of the networking
protocols implemented in the network. For purposes of the
present invention, a node which implements the present
invention must be capable of handling traffic from multiple
sources concurrently and of routing or switching traffic from
any of a set of input links to any of a set of output links.

Referring to FIG. 2, a node is not necessarily simply a
pass-through device connecting other nodes. A switching
device 16 must, of course, be capable of receiving data
traffic from other nodes over a set of input links, represented
collectively by arrow 18, and of routing or switching that
data traffic to an output link selected from a set of output
links, represented collectively by arrow 20. Typically, a
switching device in a node must also be able to handle
locally-originated data traffic, such as might be provided by
workstations 22 and 24 attached directly to switching device
16 or by a local area network 26. From the perspective of
stations on the local area network 26, such as workstations
28, 30 and 32, the switching device 16 takes on the
appearance of another station.

Referring to FIG. 3, a node 34 which performs data
routing/switching necessarily acts as a concentrator for data
traffic originating at multiple, independent traffic sources
(represented by stations 36a, 36b, 36c and 36d) which
provide the traffic to the node over connections made
(through an intervening network) represented only as a
network cloud 38 and a set of input links 40 to the node 34.
Because the traffic sources operate independent of one another,
the possibility exists that the sources will, at some point
during their normal operation, try to supply more traffic than
the node can handle without incurring unacceptable delays
or losses. A node which is receiving more traffic than it can
cope with in an acceptable manner is said to be in a
congested condition. The present invention reduces the
chances that a node will be driven into a congested condition
by controlling the flow of traffic from those upstream
stations that are the source of the congestion.

In a network conforming to the IEEE 802.3x proposed
standard, traffic management can be performed in the kind of
node 42 shown in block diagram form in FIG. 4. Such a node
can be characterized as a specialized form of data processing
system. Like any data processing system, the node 42
includes a node processor 44, a memory system 46 and an
operating system 48. To support routing and switching
operations, node 42 also includes a set of input buffers 50 for
temporarily storing data arriving over different connections
on the input links, a switch fabric 52, and a set of output
buffers 54 for temporarily storing switched data traffic until it
can be transmitted onto output links from the node.

In a preferred embodiment, the invention is implemented
as a process performed by a computer application program
56 executing in the node. The program 56 includes a
congestion monitor component 58, a pause time generator
60 and a pause time scheduler 62. The congestion monitor
58 may use any suitable technique to provide an indication
of congestion at the node. One commonly employed technique
is to monitor the occupancy rate of the output buffers 54.
A threshold occupancy level is defined. As long as the
occupancy level of a buffer remains below this threshold, the
connection is considered to be congestion-free and no flow
control actions are performed. If, however, the buffer
becomes loaded beyond the threshold occupancy level, the
connection is considered to be congested and flow control
operations may be initiated.

As noted earlier, a network conforming to IEEE 802.3x
requirements performs flow control by calculating pause
times for upstream traffic sources. The general format of an
IEEE 802.3x pause control frame is shown in FIG. 5. The
frame includes a six byte Destination Address field which
may contain either a reserved multicast address or the
address of a particular upstream station. An upstream node
recognizing the reserved multicast address will provide flow
control at a link level. An upstream node recognizing the
address of a particular upstream station within its domain
will provide flow control at a station level.
The pause control frame further includes a six byte Source
Address field which identifies the downstream station
performing the flow control operations, a two byte Length/Type
field which is encoded with a value which identifies the
frame as a control frame, a two byte MAC Control OpCode
field which is encoded to identify the frame as a PAUSE
frame, and a two byte MAC Control Parameters field which
specifies the pause time for the affected station or link. To
bring the frame into conformance with Ethernet protocols
implemented in an IEEE 802.3x system, the frame is padded
with sufficient non-data characters to increase the frame
length to the minimum length of sixty-four bytes required
for Ethernet frames.
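
The field layout just described can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the described embodiment. The function name build_pause_frame is hypothetical. The constants are the values commonly associated with IEEE 802.3x (reserved multicast address 01-80-C2-00-00-01, Length/Type 0x8808, PAUSE opcode 0x0001). The sketch also assumes that the sixty-four byte minimum includes a four byte frame check sequence appended by the MAC hardware, so the frame is padded to sixty bytes here.

```python
import struct

# Assumed constants (commonly associated with IEEE 802.3x, not taken from the text above).
PAUSE_MULTICAST = bytes.fromhex("0180C2000001")  # reserved multicast destination
MAC_CONTROL_TYPE = 0x8808                        # Length/Type value for MAC Control frames
PAUSE_OPCODE = 0x0001                            # MAC Control OpCode for PAUSE
MIN_LENGTH_WITHOUT_FCS = 60                      # 64-byte minimum minus 4-byte FCS (assumption)


def build_pause_frame(source: bytes, pause_slots: int,
                      destination: bytes = PAUSE_MULTICAST) -> bytes:
    """Hypothetical helper: assemble a pause control frame without its FCS."""
    if len(source) != 6 or len(destination) != 6:
        raise ValueError("addresses must be six bytes long")
    if not 0 <= pause_slots <= 0xFFFF:
        raise ValueError("pause time must fit in the two byte parameter field")
    # Destination Address, Source Address, Length/Type, OpCode, Parameters.
    header = struct.pack("!6s6sHHH", destination, source,
                         MAC_CONTROL_TYPE, PAUSE_OPCODE, pause_slots)
    # Pad with zero bytes up to the assumed minimum frame length.
    return header.ljust(MIN_LENGTH_WITHOUT_FCS, b"\x00")


frame = build_pause_frame(bytes.fromhex("020000000001"), pause_slots=8)
```

Passing a station address as destination instead of the default corresponds to the station-level flow control discussed above.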

The premise underlying the present invention is that even
where multiple upstream stations must be directed to remain
silent for specified pause times during a predetermined
control interval, it is advantageous to schedule or stagger
those pause times within the control interval in a way which
minimizes the number of stations that are paused or silent at
any given time while continuing to satisfy the requirement
that each station remain silent for at least its scheduled pause
time during the control interval. The scheduling of pause
times is performed using an algorithm known generally as a
bin packing algorithm which is executed at the beginning of
each control interval. The initial data for the calculations
consists of a set of pause times generated for upstream nodes
based on detection of congested connections at the flow
controlling node. The details of the congestion detection or
pause times calculations are not necessary for an
understanding of the present invention, which deals with the
scheduling of the pause times, no matter how generated.

To illustrate a preferred form of the bin packing
algorithm, the following set of arbitrarily generated pause
times is assumed to exist at the beginning of a control
interval having a duration of fifteen time slots:

8,4,3,1,11,7,2,1,9,5,3,6,2,3,5,4 and 5.

To simplify the explanation, the number of pause times in
this set and the duration of the control interval are kept low.
In practice the number of pause times and the duration of the
control interval might be considerably larger than the
numbers to be used here for illustration, depending on the size of
the network being controlled.

If the duration of the control interval is fifteen time slots,
the acceptable range of pause times is from one to fifteen
time slots. Decimal pause time values in this range can be
expressed as four bit binary words. In accordance with the
invention, each pause time within the specified range is
assigned to one of four different number groups as set forth
in the following table, depending entirely on the most
significant bit position in the word having a binary "1"
value.

Number group     Binary word
8-15             1xxx
4-7              01xx
2-3              001x
0-1              000x

Using the criteria set forth above, the set of pause times
is regrouped as indicated in the following table. Note that the
pause times are not necessarily reordered in descending
value since the grouping is not based on decimal values but
rather on most significant bit position containing a binary
"1" value.

Number group     Pause times             Binary words
8-15             8, 11, 9                1000, 1011, 1001
4-7              4, 7, 5, 6, 5, 4, 5     0100, 0111, 0101, 0110, 0101, 0100, 0101
2-3              3, 2, 3, 2, 3           0011, 0010, 0011, 0010, 0011
0-1              1, 1                    0001, 0001

The flow chart in FIG. 6 defines a preferred embodiment
of a process for scheduling pause times in order to minimize
the number of stations that are silent at any given time
during the control interval. The objective of the process is to
distribute the pause times into different bins B_i, where i is
assigned during the packing process. In a preferred
embodiment, a group coefficient k_j, an integer value, can be
assigned to each of the number groups into which the
pause times have been divided. A group coefficient
determines the maximum number of pause times that can be
selected from any particular number group during a single
iteration of the process.

The initial step 64 in the process is for any previously-
calculated pause times to be loaded into a set I at the
beginning of each control interval. While, for purposes of
illustration, the existence of a set of previously-calculated
pause times is being assumed, such pause times exist only
where congestion conditions are encountered. To determine
whether congestion control actions are even necessary, a
threshold determination 65 is made whether set I is empty.
If set I is empty, congestion control actions are not required
and the process ends. Assuming, however, that set I contains
one or more pause time values, the process continues with
a test 66, which determines whether the sum of all the pause
times in set I is less than the length T of the control interval.
If the results of test 66 are positive, all of the pause times are
packed into a single bin in step 68 and the process ends.

For the assumed set of pause times, however, the
combined sum far exceeds the assumed control interval of fifteen
time slots and the process continues to step 70 where the
contents of set I are copied into a working set C. A bin B_i,
initially B_1, is initialized to empty in step 72 and the pause
times in the highest-valued number group are grouped into
a first list in an operation 74. Since there is no guarantee that
there will be any pause times at all in the highest-valued
number group, a test 76 must be performed even on the first
iteration of the process to determine whether the first list is
really empty.

If there is at least one pause time in the first list, that pause
time is packed into bin B_i in an operation 78. By definition,
the maximum pause time value cannot exceed the length of
the control interval T, meaning that at least one pause time
from the highest-valued group of pause times will fit into an
empty bin.

After a pause time from the highest-valued group is
packed into bin B_i or it is found that the highest-valued group
list is really empty, the next number group is selected (step
80) and a list is generated of any pause times in working set
C falling within that number group. At this point, a variable
m is initialized to zero in an operation 81. The variable m is
employed in a counting routine which controls the
maximum number of attempts that will be made to select a
number from a particular number group during the process
of packing each bin. In the same operation, the coefficient k_j
previously assigned to the group is retrieved for use in the
same counting routine. As will be clearer from the
description below, each number group will be considered a
maximum of k_j times during the packing of each bin.
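
The number-group lists built in steps 74 and 80 depend only on the position of the most significant "1" bit of each pause time. A minimal sketch of that grouping follows. The names number_group and regroup are hypothetical, and the sketch assumes that pause times within a group keep the order in which they appear in the set.

```python
from collections import defaultdict


def number_group(pause_time: int) -> int:
    """Index of the most significant '1' bit; values 0 and 1 share group 0."""
    return max(pause_time.bit_length() - 1, 0)


def regroup(pause_times):
    """Map each number group (highest first) to its pause times in set order."""
    groups = defaultdict(list)
    for p in pause_times:
        groups[number_group(p)].append(p)
    return {g: groups[g] for g in sorted(groups, reverse=True)}


# The assumed set of pause times for a control interval of fifteen time slots.
PAUSE_TIMES = [8, 4, 3, 1, 11, 7, 2, 1, 9, 5, 3, 6, 2, 3, 5, 4, 5]
groups = regroup(PAUSE_TIMES)
```

Here group 3 corresponds to the range 8-15, group 2 to 4-7, group 1 to 2-3 and group 0 to 0-1.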

Since the possibility exists that there may be no pause
times that meet the criteria for a particular number group, a
test 82 is needed to determine whether the list is empty.
Assuming the list includes at least one pause time, the
variable m is incremented by one in a step 83 and the value of
the first pause time on the list is combined in an operation 84 with
the value of any pause times previously packed into the
current bin. The resulting sum is tested against the length of the
control interval T in an operation 86. If the sum obtained in
step 86 is greater than T, the selected pause time remains in
working set C. If, however, the sum is less than or equal to
T, the selected pause time is packed into the bin in an
operation 88.

A check 90 is then made to determine whether the variable
m, incremented previously in step 83, is less than the group
coefficient k_j for the selected number group. If m is less than
k_j, the process loops back to step 82 at which a determination
is made whether any pause time values remain on the list
under consideration. Steps 82, 83, 84, 86, 88 and 90 define
a program loop which repeats either until the current list is
found to be empty in step 82 or until m has been incremented
into equality with k_j for the number group being processed.
Thus, if k_j=3, the loop could be repeated up to three times,
being "broken" earlier only if the current list is found to be
empty in step 82.

When either of the specified exit conditions is found, the 
process loops to a test 92 which determines whether all of 
the number groups have been processed. Assuming a nega- 
tive result to test 92, the process moves on to the next 
lower-valued number group. 

The described steps are repeated for the pause times 
falling in each succeeding number group until the test 92 
indicates that every group has been processed. At this point, 
the bin is as packed as it is going to get using the indicated 
process. All pause times packed into the current bin B_i are
removed from set I in an operation 94 and a new (empty) bin 
is selected by incrementing i in an operation 94. The entire 
process is re-started at operation 65, which determines 
whether any pause times remain in set I. 

Since the size of set I is reduced as each successive bin is 
packed during iterations of the process, the set I will 
eventually either be empty (as indicated by a positive result 
to test 65) or become small enough to allow all remaining 
pause times to be packed into a single bin without going 
through a group-by-group process. The process is halted 
once any remaining pause times are transferred to the bin in 
operation 68. 
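
The process described above can be sketched as follows. This is a hedged illustration rather than the flow chart of FIG. 6 itself. The names pack_bins and group_coefficients are hypothetical. The sketch assumes that each list keeps the order of the original set, that a pause time which does not fit is dropped from the current list (but not from the working set), and that any group without an explicit coefficient uses a coefficient of one.

```python
def number_group(pause_time: int) -> int:
    """Index of the most significant '1' bit; values 0 and 1 share group 0."""
    return max(pause_time.bit_length() - 1, 0)


def pack_bins(pause_times, T, group_coefficients=None):
    """Pack pause times into bins whose sums never exceed T (sketch only)."""
    k = group_coefficients or {}
    if any(p < 1 or p > T for p in pause_times):
        raise ValueError("every pause time must lie between 1 and T")
    remaining = list(pause_times)            # set I (step 64)
    top = number_group(T)                    # highest-valued number group
    bins = []
    while remaining:                         # test 65
        if sum(remaining) < T:               # test 66
            bins.append(list(remaining))     # step 68
            break
        working = list(remaining)            # working set C (step 70)
        current, used = [], 0                # empty bin B_i (step 72)
        for g in range(top, -1, -1):
            # Steps 74 and 80: list of pause times in this number group.
            candidates = [p for p in working if number_group(p) == g]
            # Step 78 packs a single pause time from the highest group.
            limit = 1 if g == top else k.get(g, 1)
            m = 0                            # operation 81
            while candidates and m < limit:  # tests 82 and 90
                m += 1                       # step 83
                p = candidates.pop(0)
                if used + p <= T:            # operations 84 and 86
                    current.append(p)        # operation 88
                    used += p
                    working.remove(p)
        for p in current:                    # operation 94
            remaining.remove(p)
        bins.append(current)
    return bins


bins = pack_bins([8, 4, 3, 1, 11, 7, 2, 1, 9, 5, 3, 6, 2, 3, 5, 4, 5], T=15)
```

Under these assumptions the first bin receives 8, 4 and 3, and the example set is packed into seven bins. Because the order of pause times within each list is assumed, the contents of later bins need not match those shown in FIG. 7.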

FIG. 7 is a representation of the "movement" of pause
times from the working set C into successive bins during
successive iterations of the process previously described
with reference to FIG. 6. The vertical columns under the
Iterations heading show the contents of the working set at
each iteration of the process with the bolded italicized
numbers representing the pause times that are transferred to
the current bin during each iteration. For example, in the first
iteration of the process, the pause times identified by the
bolded, italicized numbers 8, 4, and 3 are moved from the
working set into Bin 1. Any pause time which is moved into
a bin during a process iteration is removed from the working
set, causing the working set to shrink with each iteration.
This shrinkage is illustrated by the dashes which appear to
the right of any pause time which was moved into a bin
during a prior iteration of the process.

It may be desirable to assign a unitary group coefficient to
each of the number groups. The advantage of doing this is
computational simplicity. The number of process iterations
is limited to the number of number groups. The possible
disadvantage of setting all group coefficients to one is that
only the first pause time from each group will be considered
during each process iteration. Even if the first pause time
cannot be packed into the current bin (because it would
cause a bin overflow), no consideration will be given during
the current process to using other possibly-smaller pause
times falling within the same number group. As a result,
some bins may not be optimally packed. For example, Bins 4 and
5 are packed with only ten and eight time slots, respectively,
for the assumed pause time set and group coefficients limited
to one.

Assuming non-unitary group coefficients are to be used, a
preferred coefficient value for a particular number group can
be obtained in a number of ways. Since most data
communication networks exhibit the same general traffic patterns
over extended periods of time, the preferred coefficient for
each number group may be obtained simply by viewing the
pause time statistics over an extended period and then
assigning a "permanent" coefficient to each of the groups.
While this approach has the advantage of simplicity, it
clearly does not take into account dynamic changes in
network traffic patterns. Depending on how significant
changes are expected to be, it may be desirable to perform
a histographic or other statistical analysis of network
behavior on a real-time or near real-time basis and to assign group
coefficients based on the most recent available results of
such analysis.

Depending on the values of the pause times in any
particular working set, it may also be mathematically
possible to pack bins more fully by using a "best fit" bin packing
algorithm. While the use of such an algorithm is within the
scope of the invention, such algorithms may not always be
a preferred choice since any improvement in bin packing
efficiency may be outweighed by the added complexity of
the required calculations.
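
For comparison, one common "best fit" variant (best-fit decreasing) can be sketched as follows. The text above does not specify which best fit algorithm is contemplated, so the choice of this variant and the name best_fit_decreasing are assumptions made purely for illustration.

```python
def best_fit_decreasing(pause_times, T):
    """Place each pause time, largest first, into the fullest bin that still has room."""
    bins, free = [], []                      # bin contents and remaining capacity
    for p in sorted(pause_times, reverse=True):
        if p < 1 or p > T:
            raise ValueError("every pause time must lie between 1 and T")
        fits = [i for i, room in enumerate(free) if room >= p]
        if fits:
            # Best fit: the bin left with the least spare room after packing.
            i = min(fits, key=lambda j: free[j] - p)
        else:
            bins.append([])
            free.append(T)
            i = len(bins) - 1
        bins[i].append(p)
        free[i] -= p
    return bins


bfd_bins = best_fit_decreasing([8, 4, 3, 1, 11, 7, 2, 1, 9, 5, 3, 6, 2, 3, 5, 4, 5], T=15)
```

The sort and the search over all open bins for every pause time are the kind of added calculation that the preceding paragraph weighs against any gain in packing efficiency.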

The detailed discussion of the bin packing process, while
necessary for an understanding of the present invention,
does not lend itself to a clear understanding of how that
process actually affects network operations. Recall that each
pause time is associated with an upstream station that is
contributing data traffic to a congested condition at the node
implementing the process. The objective of the congestion
control process is to generate information that is needed to
inform each such upstream station not only how long it must
pause during the next control interval but also when that
pause time is to begin within the control interval.

FIG. 8 represents the intended responses of the upstream 
stations to pause control messages resulting from the pro- 

60 cess. Each horizontal bar represents a single station with its 
pause time represented by an unshaded area and its trans- 
mission time(s) represented by cross-hatched bar(s). By 
definition, any source that is not paused is expected to be 
transmitting, which means that any given station may have
a transmit-pause-transmit sequence during the control interval.
It should be noted that the pause time for a station within
a particular bin does not overlap the pause time for any other
station within the same bin. This figure illustrates the
intended effect of the process; namely, the distribution of the 
pause times over the length of the control interval so that all 
pause time obligations are satisfied while minimizing the 
number of traffic sources that must remain silent or paused 
at any given time. 
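
One way to derive the start of each pause from a packed bin can be sketched as follows. The sketch assumes that the pause times in a bin are laid end to end from the start of the control interval, which is only one arrangement that satisfies the non-overlap property; the names `schedule_pauses` and `paused_at` and the station identifiers are hypothetical.

```python
def schedule_pauses(bins):
    """Map each station to (start_offset, pause_time).

    bins: list of bins, each a list of (station_id, pause_time) pairs.
    Assumption: pauses in a bin are placed back to back from offset 0.
    """
    schedule = {}
    for entries in bins:
        offset = 0
        for station, pause in entries:
            schedule[station] = (offset, pause)
            offset += pause  # the next pause in this bin starts here
    return schedule


def paused_at(schedule, t):
    """Return the sorted list of stations that are paused at time t."""
    return sorted(s for s, (start, dur) in schedule.items()
                  if start <= t < start + dur)


if __name__ == "__main__":
    sched = schedule_pauses([[("A", 4), ("B", 6)], [("C", 7)]])
    print(sched)
    print(paused_at(sched, 5))
```

Because at most one station per bin is paused at any instant, the number of stations paused simultaneously never exceeds the number of bins.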

While there have been described what are considered to 
be preferred embodiments of the present invention, varia- 
tions and modifications therein will occur to those skilled in 
the art. For example, while the invention has been presented 
in the context of scheduling pause times in order to avoid 
congestion at a particular node in a network, the invention 
can be applied to any system in which multiple systems must 
undertake some sort of data processing activity within a 
known control interval. Through the use of the invention, the 
start and the duration of the activity at each of the systems 
can be controlled. It is intended that the appended claims 
shall be construed to include the preferred embodiments and 
all such variations and modifications that fall within the true 
spirit and scope of the present invention. 

What is claimed is: 

1. In a data communication network having a plurality of 
data traffic sources interconnected by transmission links, a 
method of controlling the flow of data traffic from said
sources to a particular node over a predetermined control 
interval, said method comprising the steps of: 

a) determining a desired transmission pattern for each 
data traffic source during the control interval; 

b) establishing one or more bins, each bin including 
entries for one or more traffic sources, grouping the 
data transmission sources into different, non- 
overlapping groups as a function of the pause times in 
the traffic patterns, moving no more than one source 
from a particular group into a bin, said entries being 
selected so that no more than one traffic source included 
in the bin will be paused at any given time during the 
control interval and the sum of the pause times in the 
bin is no greater than the length of the control interval; 
and 

c) generating a transmission control message for each of 
the data traffic sources to cause said sources to transmit 
or pause transmission during the control interval as a 
function of information contained in the transmission 
control message. 

2. In a data communication network having a plurality of 
data traffic sources, a plurality of switching devices for 
switching data traffic provided by said sources, and a plu- 
rality of links interconnecting said switching devices, a 
method of controlling the flow of data traffic on a link, said 
method comprising the steps of: 

a) monitoring the data traffic on the link to detect the onset 
of a congestion condition; 

b) defining a set consisting of traffic sources which should
be paused within a control interval to avoid congestion; 

c) establishing the length of a pause time for each of the 
traffic sources in the defined set within a predetermined 
control interval by grouping the data transmission 
sources into different, non-overlapping groups as a 
function of the pause times in the traffic patterns,
moving no more than one source from a particular 
group into a bin; and 

d) scheduling the initialization of the established pause 
time for each traffic source in the defined set to mini- 
mize the number of traffic sources in the set that are 
paused at any given time within the control interval. 

3. In a data communication network having a plurality of 
systems capable of independently performing data processing
tasks, each of said tasks requiring a known execution
time, a method of scheduling the start of tasks at each of said
systems so as to enable the execution of all of said tasks
within a predetermined control interval, said method being
performed at a scheduling system connected to the systems
at which the tasks are to be performed and comprising the
steps of:

a) creating a number set consisting of the times required
for execution of a set of tasks within the next control
interval;

b) assigning each task execution time in said number set
into one of two or more non-overlapping number
groups, each of said number groups including all task
execution times having numeric values within a predetermined
range of numbers;

c) executing a bin packing algorithm to assign said task
execution times to a number of bins, each of said bins
including no more than one task execution time from
each of said number groups by

i) determining whether the sum of the task execution
times for all of the tasks exceeds the length of the
control interval;

ii) if the sum is less than or equal to the length of the 
control interval, assigning all of said task execution 
times to a single bin and proceeding to the step of 
generating start messages; 

iii) if the sum is greater than the length of the control
interval, creating a packed bin by initializing a new
empty bin, and moving a task execution time from
each number group into the bin if the sum of the bin
contents following the move will be less than or
equal to the length of the control interval;

iv) repeating step iii) until all of the task execution
times have been moved into a bin, and proceeding to
the step of generating start messages;

d) generating start messages for said systems, each message
including a time at which a system is to start
execution of the task to be performed, the time being a
function of the relative location of the task execution
time for the system within its assigned bin; and

e) distributing the start messages to said systems.

4. For use in a network having a plurality of data 
processing systems, a scheduling apparatus for scheduling 
activities at said data processing systems over the duration 
of a control interval having a known length, said apparatus
comprising: 

a) means for determining the length of an appropriate 
activity period for each of the data processing systems; 

b) means for packing the activity periods for said data
processing systems into one or more bins by grouping
the data transmission sources into different, non-
overlapping groups as a function of the pause times in
the traffic patterns, moving no more than one source
from a particular group into a bin so that no overlap
exists between any two activity periods within the same
bin and the combined length of activity periods within 
a bin is no greater than the duration of the control 
interval; and 

c) means for distributing activity control messages from
the scheduling apparatus to each of the data processing 
systems. 

5. For use in a data communication network having a
plurality of data sources and a plurality of devices for
routing data traffic among the sources, a flow control apparatus
located at one of the routing devices comprising:

a) a congestion monitor for detecting the onset of traffic
congestion at a predetermined location in the network; 

b) a source identifier for identifying those data traffic 
sources contributing to the congestion and for including 
those sources in a set; 

c) a pause time calculator for determining an appropriate 
pause time for each of the sources in the set; 




d) a bin packing component for assigning the sources in 
the set to different bins, each bin having non- 
overlapping pause times by moving no more than one 
source from a particular group into a bin and the sum 
of the pause times being no greater than the length of 
the control interval; and 

e) a scheduling element for generating source-controlling 
messages, each message specifying the start and the 
duration of the pause time for the particular source 
identified in the message. 

***** 